<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Chapter 1 Introduction | Data visualisation using R, for researchers who don’t use R</title>
<meta name="author" content="Emily Nordmann, Phil McAleer, Wilhelmiina Toivo, Helena Paterson, Lisa DeBruine">
<meta name="description" content="Use of the programming language R (R Core Team, 2022) for data processing and statistical analysis by researchers is increasingly common, with an average yearly growth of 87% in the number of...">
<meta name="generator" content="bookdown 0.25 with bs4_book()">
<meta property="og:title" content="Chapter 1 Introduction | Data visualisation using R, for researchers who don’t use R">
<meta property="og:type" content="book">
<meta property="og:url" content="https://psyteachr.github.io/introdataviz/introduction.html">
<meta property="og:image" content="https://psyteachr.github.io/introdataviz/images/logos/logo.png">
<meta property="og:description" content="Use of the programming language R (R Core Team, 2022) for data processing and statistical analysis by researchers is increasingly common, with an average yearly growth of 87% in the number of...">
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="Chapter 1 Introduction | Data visualisation using R, for researchers who don’t use R">
<meta name="twitter:description" content="Use of the programming language R (R Core Team, 2022) for data processing and statistical analysis by researchers is increasingly common, with an average yearly growth of 87% in the number of...">
<meta name="twitter:image" content="https://psyteachr.github.io/introdataviz/images/logos/logo.png">
<!-- JS --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/fuse.js/6.4.6/fuse.js" integrity="sha512-zv6Ywkjyktsohkbp9bb45V6tEMoWhzFzXis+LrMehmJZZSys19Yxf1dopHx7WzIKxr5tK2dVcYmaCk2uqdjF4A==" crossorigin="anonymous"></script><script src="https://kit.fontawesome.com/6ecbd6c532.js" crossorigin="anonymous"></script><script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link href="libs/bootstrap-4.6.0/bootstrap.min.css" rel="stylesheet">
<script src="libs/bootstrap-4.6.0/bootstrap.bundle.min.js"></script><script src="libs/bs3compat-0.3.1/transition.js"></script><script src="libs/bs3compat-0.3.1/tabs.js"></script><script src="libs/bs3compat-0.3.1/bs3compat.js"></script><link href="libs/bs4_book-1.0.0/bs4_book.css" rel="stylesheet">
<script src="libs/bs4_book-1.0.0/bs4_book.js"></script><!-- Global site tag (gtag.js) - Google Analytics --><script async src="https://www.googletagmanager.com/gtag/js?id=G-6NP3MF25W3"></script><script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());

      gtag('config', 'G-6NP3MF25W3');
    </script><script src="https://cdnjs.cloudflare.com/ajax/libs/autocomplete.js/0.38.0/autocomplete.jquery.min.js" integrity="sha512-GU9ayf+66Xx2TmpxqJpliWbT5PiGYxpaG8rfnBEk1LL8l1KGkRShhngwdXK1UgqhAzWpZHSiYPc09/NwDQIGyg==" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mark.js/8.11.1/mark.min.js" integrity="sha512-5CYOlHXGh6QpOFA/TeTylKLWfB3ftPsde7AnmhuitiTX4K5SqCLBeKro6sPS8ilsz1Q4NRx3v8Ko2IBiszzdww==" crossorigin="anonymous"></script><!-- CSS --><style type="text/css">
    
    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
  </style>
<style type="text/css">
    /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
    div.csl-bib-body { }
    div.csl-entry {
      clear: both;
        }
    .hanging div.csl-entry {
      margin-left:2em;
      text-indent:-2em;
    }
    div.csl-left-margin {
      min-width:2em;
      float:left;
    }
    div.csl-right-inline {
      margin-left:2em;
      padding-left:1em;
    }
    div.csl-indent {
      margin-left: 2em;
    }
  </style>
<link rel="stylesheet" href="include/psyteachr.css">
<link rel="stylesheet" href="include/webex.css">
<link rel="stylesheet" href="include/style.css">
</head>
<body data-spy="scroll" data-target="#toc">

<div class="container-fluid">
<div class="row">
  <header class="col-sm-12 col-lg-3 sidebar sidebar-book"><a class="sr-only sr-only-focusable" href="#content">Skip to main content</a>

    <div class="d-flex align-items-start justify-content-between">
      <h1>
        <a href="index.html" title="">Data visualisation using R, for researchers who don’t use R</a>
      </h1>
      <button class="btn btn-outline-primary d-lg-none ml-2 mt-1" type="button" data-toggle="collapse" data-target="#main-nav" aria-expanded="true" aria-controls="main-nav"><i class="fas fa-bars"></i><span class="sr-only">Show table of contents</span></button>
    </div>

    <div id="main-nav" class="collapse-lg">
      <form role="search">
        <input id="search" class="form-control" type="search" placeholder="Search" aria-label="Search">
</form>

      <nav aria-label="Table of contents"><h2>Table of contents</h2>
        <ul class="book-toc list-unstyled">
<li><a class="" href="index.html">Overview</a></li>
<li><a class="active" href="introduction.html"><span class="header-section-number">1</span> Introduction</a></li>
<li><a class="" href="getting-started.html"><span class="header-section-number">2</span> Getting Started</a></li>
<li><a class="" href="transforming-data.html"><span class="header-section-number">3</span> Transforming Data</a></li>
<li><a class="" href="representing-summary-statistics.html"><span class="header-section-number">4</span> Representing Summary Statistics</a></li>
<li><a class="" href="multi-part-plots.html"><span class="header-section-number">5</span> Multi-part Plots</a></li>
<li><a class="" href="advanced-plots.html"><span class="header-section-number">6</span> Advanced Plots</a></li>
<li><a class="" href="conclusion.html"><span class="header-section-number">7</span> Conclusion</a></li>
<li class="book-part">Appendices</li>
<li><a class="" href="additional-resources.html"><span class="header-section-number">A</span> Additional resources</a></li>
<li><a class="" href="additional-customisation-options.html"><span class="header-section-number">B</span> Additional customisation options</a></li>
<li><a class="" href="plotstyle.html"><span class="header-section-number">C</span> Styling Plots</a></li>
<li><a class="" href="advanced-plots-1.html"><span class="header-section-number">D</span> Advanced Plots</a></li>
<li><a class="" href="license.html">License</a></li>
<li><a class="" href="references.html">References</a></li>
</ul>

      </nav>
</div>
  </header><main class="col-sm-12 col-md-9 col-lg-7" id="content"><div id="introduction" class="section level1" number="1">
<h1>
<span class="header-section-number">1</span> Introduction<a class="anchor" aria-label="anchor" href="#introduction"><i class="fas fa-link"></i></a>
</h1>
<p>Use of the programming language R <span class="citation">(<a href="references.html#ref-R-base" role="doc-biblioref">R Core Team, 2022</a>)</span> for data processing and statistical analysis by researchers is increasingly common, with an average yearly growth of 87% in the number of citations of the R Core Team between 2006 and 2018 <span class="citation">(<a href="references.html#ref-barrett2019six" role="doc-biblioref">Barrett, 2019</a>)</span>. In addition to benefiting reproducibility and transparency, R's open-source nature gives researchers a much larger range of fully customisable data visualisation options than is typically available in point-and-click software. These visualisations not only look attractive but can also make the distribution of the underlying data more transparent, instead of relying on common visualisations of aggregated data such as bar charts of means <span class="citation">(<a href="references.html#ref-newman2012bar" role="doc-biblioref">Newman &amp; Scholl, 2012</a>)</span>.</p>
<p>Yet, for many researchers the benefits of using R are obscured by the perception that coding skills are difficult to learn <span class="citation">(<a href="references.html#ref-robins2003learning" role="doc-biblioref">Robins et al., 2003</a>)</span>. Coupled with this, only a minority of psychology programmes currently teach coding skills <span class="citation">(<a href="references.html#ref-rminr" role="doc-biblioref">Wills, n.d.</a>)</span>, and the majority of both undergraduate and postgraduate courses use proprietary point-and-click software such as SAS, SPSS or Microsoft Excel. While sophisticated use of proprietary software often requires computational thinking skills akin to coding (for instance, SPSS scripts or Excel formulas), we have found that many researchers do not realise they already have introductory coding skills. In this tutorial we aim to change that perception by showing how experienced researchers can build on their existing computational skills to use the powerful data visualisation tools offered by R.</p>
<p>In this tutorial we provide a practical introduction to data visualisation using R, specifically aimed at researchers who have little to no prior experience of using R. First, we detail the rationale for using R for data visualisation and introduce the "grammar of graphics" that underlies data visualisation with the <code>ggplot2</code> package. The tutorial then walks the reader through replicating plots that are commonly available in point-and-click software, such as histograms and boxplots, and shows how the code for these "basic" plots can easily be extended to less commonly available options such as violin-boxplots.</p>
<div id="why-r-for-data-visualisation" class="section level2" number="1.1">
<h2>
<span class="header-section-number">1.1</span> Why R for data visualisation?<a class="anchor" aria-label="anchor" href="#why-r-for-data-visualisation"><i class="fas fa-link"></i></a>
</h2>
<p>Writing code to visualise data brings the same advantages as writing code for statistical analysis rather than using point-and-click software: reproducibility and transparency. The need for psychological researchers to work in reproducible ways has been well documented and discussed in response to the replication crisis <span class="citation">(e.g. <a href="references.html#ref-munafo2017manifesto" role="doc-biblioref">Munafò et al., 2017</a>)</span>, and we will not repeat those arguments here. However, reproducibility has an additional benefit that is acknowledged less often than the loftier goal of improving psychological science: if you write code to produce your plots, you can reuse and adapt that code in the future rather than starting from scratch each time.</p>
<p>In addition to the benefits of reproducibility, using R for data visualisation gives the researcher almost total control over each element of the plot. Whilst this flexibility can seem daunting at first, the ability to write reusable code recipes (and use recipes created by others) is highly advantageous. The level of customisation and the professional quality of the output available in R have, for instance, led news outlets such as the BBC <span class="citation">(<a href="references.html#ref-BBC-R" role="doc-biblioref">Visual &amp; Journalism, 2019</a>)</span> and the New York Times <span class="citation">(<a href="references.html#ref-NYT-R" role="doc-biblioref">Bertini &amp; Stefaner, 2015</a>)</span> to adopt R as their preferred data visualisation tool.</p>
</div>
<div id="a-layered-grammar-of-graphics" class="section level2" number="1.2">
<h2>
<span class="header-section-number">1.2</span> A layered grammar of graphics<a class="anchor" aria-label="anchor" href="#a-layered-grammar-of-graphics"><i class="fas fa-link"></i></a>
</h2>
<p>There are multiple approaches to data visualisation in R; in this paper we use the popular package<a class="footnote-ref" tabindex="0" data-toggle="popover" data-content="&lt;p&gt;The power of R is that it is extendable and open source: put simply, if a function doesn't exist or is difficult to use, anyone can create a new &lt;strong&gt;package&lt;/strong&gt; that contains data and code to allow you to perform new tasks. You may find it helpful to think of packages as additional apps that you need to download separately to extend the functionality beyond what comes with &quot;Base R&quot;.&lt;/p&gt;"><sup>1</sup></a> <code>ggplot2</code> <span class="citation">(<a href="references.html#ref-ggplot2" role="doc-biblioref">Wickham, 2016</a>)</span>, which is part of the larger <code>tidyverse</code><a class="footnote-ref" tabindex="0" data-toggle="popover" data-content='&lt;p&gt;Because there are so many different ways to achieve the same thing in R, when Googling for help with R, it is useful to append the name of the package or approach you are using, e.g., "how to make a histogram ggplot2".&lt;/p&gt;'><sup>2</sup></a> <span class="citation">(<a href="references.html#ref-tidyverse" role="doc-biblioref">Wickham, 2017</a>)</span> collection of packages that provide functions for data wrangling, descriptives, and visualisation. A grammar of graphics <span class="citation">(<a href="references.html#ref-wilkinson2005graph" role="doc-biblioref">Wilkinson et al., 2005</a>)</span> is a standardised way to describe the components of a graphic. <code>ggplot2</code> uses a layered grammar of graphics <span class="citation">(<a href="references.html#ref-wickham2010layered" role="doc-biblioref">Wickham, 2010</a>)</span>, in which plots are built up in a series of layers. It may be helpful to think of any picture as having multiple elements that sit semi-transparently on top of each other. 
A good analogy is the way old Disney films were animated: artists created a background and then added movable elements on top of it using transparent sheets.</p>
<p>Figure <a href="introduction.html#fig:layers">1.1</a> displays the evolution of a simple scatterplot using this layered approach. First, the plot space is built (layer 1) and the variables are specified (layer 2). Next, the type of visualisation (known as a <code>geom</code>) is specified for these variables (layer 3); in this case <code><a href="https://ggplot2.tidyverse.org/reference/geom_point.html">geom_point()</a></code> is called to visualise the individual data points. A second geom is then added to draw a line of best fit (layer 4), the axis labels are edited for readability (layer 5), and finally, a theme is applied to change the overall appearance of the plot (layer 6).</p>
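<p>The six layers described above can be sketched in <code>ggplot2</code> code as follows. This is a minimal illustration, not the code used to draw Figure 1.1: the data frame <code>dat</code> and its values are hypothetical (generated with <code>runif()</code> and <code>rnorm()</code>), and <code>theme_minimal()</code> is simply one example of a built-in theme.</p>

```r
library(ggplot2)

# hypothetical example data: 50 participants with an age and a mean RT
set.seed(42)
dat <- data.frame(age = round(runif(50, 18, 60)))
dat$rt_word <- 300 + 3 * dat$age + rnorm(50, sd = 40)

p <- ggplot(dat) +                                     # layer 1: the plot space
  aes(x = age, y = rt_word) +                          # layer 2: map variables to axes
  geom_point() +                                       # layer 3: individual data points
  geom_smooth(method = "lm", formula = y ~ x) +        # layer 4: line of best fit
  labs(x = "Age (years)", y = "Reaction time (ms)") +  # layer 5: readable axis labels
  theme_minimal()                                      # layer 6: overall appearance

p
```

<p>Each <code>+</code> adds one layer to the plot object, so the recipe can be read from top to bottom in the same order as the panels of the figure.</p>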
<div class="figure" style="text-align: center">
<span style="display:block;" id="fig:layers"></span>
<img src="01-ch1_files/figure-html/layers-1.png" alt="Evolution of a layered plot" width="80%"><p class="caption">
Figure 1.1: Evolution of a layered plot
</p>
</div>
<p>Importantly, each layer is independent and independently customisable. For example, the size, colour and position of each component can be adjusted, or one could remove the first geom (the data points) to visualise only the line of best fit, simply by removing the layer that draws the data points (Figure <a href="introduction.html#fig:remove-layer">1.2</a>). The use of layers makes it easy to build up complex plots step by step, and to adapt or extend plots from existing code.</p>
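<p>As a sketch of removing a layer, the recipe below is the same kind of plot with the <code>geom_point()</code> line deleted, so only the line of best fit is drawn. The data are again hypothetical, and this is not necessarily the code behind Figure 1.2.</p>

```r
library(ggplot2)

# hypothetical example data (same recipe as a standalone example)
set.seed(42)
dat <- data.frame(age = round(runif(50, 18, 60)))
dat$rt_word <- 300 + 3 * dat$age + rnorm(50, sd = 40)

# identical layers, minus geom_point(): only the line of best fit remains
p_line <- ggplot(dat, aes(x = age, y = rt_word)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Age (years)", y = "Reaction time (ms)") +
  theme_minimal()

p_line
```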
<div class="figure" style="text-align: center">
<span style="display:block;" id="fig:remove-layer"></span>
<img src="01-ch1_files/figure-html/remove-layer-1.png" alt="Plot with scatterplot layer removed." width="80%"><p class="caption">
Figure 1.2: Plot with scatterplot layer removed.
</p>
</div>
</div>
<div id="tutorial-components" class="section level2" number="1.3">
<h2>
<span class="header-section-number">1.3</span> Tutorial components<a class="anchor" aria-label="anchor" href="#tutorial-components"><i class="fas fa-link"></i></a>
</h2>
<p>This tutorial contains three components.</p>
<ol style="list-style-type: decimal">
<li>A traditional PDF manuscript that can easily be saved, printed, and cited.</li>
<li>An online version of the tutorial published at <a href="https://psyteachr.github.io/introdataviz/" class="uri">https://psyteachr.github.io/introdataviz/</a> that may be easier to copy and paste code from and that also provides the optional activity solutions as well as additional appendices, including code tutorials for advanced plots beyond the scope of this paper and links to additional resources.</li>
<li>An Open Science Framework repository published at <a href="https://osf.io/bj83f/" class="uri">https://osf.io/bj83f/</a> that contains the simulated dataset (see below), preprint, and R Markdown workbook.</li>
</ol>
</div>
<div id="simulated-dataset" class="section level2" number="1.4">
<h2>
<span class="header-section-number">1.4</span> Simulated dataset<a class="anchor" aria-label="anchor" href="#simulated-dataset"><i class="fas fa-link"></i></a>
</h2>
<p>For the purpose of this tutorial, we will use simulated data for a 2 x 2 mixed-design lexical decision task in which 100 participants must decide whether a presented word is a real word or a non-word. There are 100 rows (1 for each participant) and 7 variables:</p>
<ul>
<li>
<p>Participant information:</p>
<ul>
<li>
<code>id</code>: Participant ID</li>
<li>
<code>age</code>: Age</li>
</ul>
</li>
<li>
<p>1 between-subject independent variable (IV):</p>
<ul>
<li>
<code>language</code>: Language group (1 = monolingual, 2 = bilingual)</li>
</ul>
</li>
<li>
<p>4 columns for the 2 dependent variables (DVs) of RT and accuracy, crossed by the within-subject IV of condition:</p>
<ul>
<li>
<code>rt_word</code>: Reaction time (ms) for word trials</li>
<li>
<code>rt_nonword</code>: Reaction time (ms) for non-word trials</li>
<li>
<code>acc_word</code>: Accuracy for word trials</li>
<li>
<code>acc_nonword</code>: Accuracy for non-word trials</li>
</ul>
</li>
</ul>
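<p>To make the wide, one-row-per-participant structure concrete, the sketch below builds a data frame with the same seven column names. It is not the authors' simulation and does not reproduce <code>ldt_data.csv</code>: the ID format, distributions, and parameters are all arbitrary assumptions, and accuracy is assumed here to be a percentage.</p>

```r
set.seed(1)
n <- 100  # one row per participant

# hypothetical stand-in for ldt_data.csv: same column names, made-up values
sim <- data.frame(
  id          = sprintf("S%03d", 1:n),
  age         = sample(18:60, n, replace = TRUE),
  language    = rep(1:2, each = n / 2),               # 1 = monolingual, 2 = bilingual
  rt_word     = round(rnorm(n, mean = 400, sd = 50)), # ms
  rt_nonword  = round(rnorm(n, mean = 550, sd = 60)), # ms
  acc_word    = round(runif(n, 80, 100)),             # assumed % correct
  acc_nonword = round(runif(n, 70, 100))              # assumed % correct
)

str(sim)
```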
<p>For newcomers to R, we would suggest working through this tutorial with the simulated dataset, then extending the code to your own datasets with a similar structure, and finally generalising the code to new structures and problems.</p>
</div>
<div id="setting-up-r-and-rstudio" class="section level2" number="1.5">
<h2>
<span class="header-section-number">1.5</span> Setting up R and RStudio<a class="anchor" aria-label="anchor" href="#setting-up-r-and-rstudio"><i class="fas fa-link"></i></a>
</h2>
<p>We strongly encourage the use of RStudio <span class="citation">(<a href="references.html#ref-RStudio" role="doc-biblioref">RStudio Team, 2021</a>)</span> to write code in R. R is the programming language, whilst RStudio is an <em>integrated development environment</em> (IDE) that makes working with R easier. More information on installing both R and RStudio can be found in the additional resources.</p>
<p>Projects are a useful way of keeping all your code, data, and output in one place. To create a new project, open RStudio and click <code>File - New Project - New Directory - New Project</code>. You will be prompted to give the project a name and to choose where on your computer to store it. Once you have done this, click <code>Create Project</code>. Download the simulated dataset and the R Markdown workbook from <a href="https://osf.io/bj83f/files/" target="_blank">the online materials</a> (<code>ldt_data.csv</code>, <code>workbook.Rmd</code>) and move them into the project folder. The Files pane in the bottom right of RStudio should now display this folder and the files it contains. This folder is known as your <em>working directory</em>: it is where R will look for any data you wish to import and where it will save any output you create.</p>
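<p>A quick way to confirm that the project is set up correctly is to ask R where its working directory is and whether the two downloaded files are in it. The sketch below uses only base R functions; the file names are the ones listed above.</p>

```r
# print the current working directory (in an RStudio project, the project folder)
getwd()

# TRUE means the file was found in the working directory
found <- file.exists(c("ldt_data.csv", "workbook.Rmd"))
found
```

<p>If either value is <code>FALSE</code>, the file has probably been saved somewhere else, such as your Downloads folder, and still needs to be moved.</p>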
<p>This tutorial requires the packages in the <code>tidyverse</code> collection, as well as the <code>patchwork</code> package. To install these packages, copy and paste the code below into the console (the left-hand pane) and press Enter to run it.</p>
<div class="sourceCode" id="cb1"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span class="co"># only run in the console, never put this in a script </span>
<span class="va">package_list</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="st">"tidyverse"</span>, <span class="st">"patchwork"</span><span class="op">)</span>
<span class="fu"><a href="https://rdrr.io/r/utils/install.packages.html">install.packages</a></span><span class="op">(</span><span class="va">package_list</span><span class="op">)</span></code></pre></div>
<p>R Markdown is a dynamic format that allows you to combine text and code in one reproducible document. The R Markdown workbook available in the <a href="https://osf.io/bj83f/files/" target="_blank">online materials</a> contains all the code in this tutorial, and the additional resources include more information about using R Markdown for reproducible reports, with links to further guides.</p>
<p>The install code above is not included in the workbook because every time you run <code>install.packages()</code>, it installs the latest version of the package. Leaving this code in your script can therefore lead you to unintentionally install a package update you didn't want. For this reason, avoid including install code in any script or R Markdown document.</p>
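<p>If you want to re-run the console command without updating packages you already have, one option (our suggestion, not part of the original workbook) is to first check which packages from the list are missing. The sketch uses base R's <code>installed.packages()</code> and leaves the actual <code>install.packages()</code> call as a comment, to be run in the console only.</p>

```r
package_list <- c("tidyverse", "patchwork")

# names of packages in the list that are not installed on this computer yet
to_install <- setdiff(package_list, rownames(installed.packages()))
to_install

# then, in the console only (never in a script):
# install.packages(to_install)
```

<p>A script or workbook then only needs to load the installed packages, for example with <code>library(tidyverse)</code> and <code>library(patchwork)</code>.</p>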
<p>For more information on how to use R with RStudio, please see the additional resources in the online appendices.</p>
</div>
<div id="preparing-your-data" class="section level2" number="1.6">
<h2>
<span class="header-section-number">1.6</span> Preparing your data<a class="anchor" aria-label="anchor" href="#preparing-your-data"><i class="fas fa-link"></i></a>
</h2>
<p>Before you start visualising your data, it must be in an appropriate format. The preparatory steps described in the following sections can all be carried out reproducibly in R, and the additional resources point to further tutorials for doing so. However, performing these tasks in R can require more sophisticated coding skills, and the appropriate solutions and tools depend on the idiosyncrasies of each dataset. For this reason, in this tutorial we encourage readers to complete data preparation using whichever method they are most comfortable with and to focus on data visualisation.</p>
<div id="data-format" class="section level3" number="1.6.1">
<h3>
<span class="header-section-number">1.6.1</span> Data format<a class="anchor" aria-label="anchor" href="#data-format"><i class="fas fa-link"></i></a>
</h3>
<p>The simulated lexical decision data is provided in a <code>csv</code> (comma-separated values) file. Functions exist in R to read many other types of data files; for example, the <code>rio</code> package's <code>import()</code> function can read most file types. However, <code>csv</code> files avoid problems such as Excel's insistence on mangling anything that even vaguely resembles a date. You may wish to export your data as a <code>csv</code> file that contains only the data you want to visualise, rather than a full, larger workbook. It is possible to clean almost any file reproducibly in R; however, as noted above, this can require higher-level coding skills. For getting started with visualisation, we suggest removing summary rows and additional notes from any files you import so that each file contains only the rows and columns of data you want to plot.</p>
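<p>As a sketch of reading a <code>csv</code> file, the example below uses <code>read_csv()</code> from the <code>readr</code> package (part of the <code>tidyverse</code>). So that it runs anywhere, it first writes a small hypothetical file to a temporary location; with the tutorial data you would instead call <code>read_csv("ldt_data.csv")</code>.</p>

```r
library(readr)

# write a tiny hypothetical csv file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,age,language",
             "S001,22,1",
             "S002,35,2"), tmp)

# read_csv() guesses each column's type; suppressMessages() hides the column summary
dat <- suppressMessages(read_csv(tmp))
dat
```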
</div>
<div id="variable-names" class="section level3" number="1.6.2">
<h3>
<span class="header-section-number">1.6.2</span> Variable names<a class="anchor" aria-label="anchor" href="#variable-names"><i class="fas fa-link"></i></a>
</h3>
<p>Ensuring that your variable names are consistent can make it much easier to work in R. We recommend using short but informative variable names, for example <code>rt_word</code> is preferred over <code>dv1_iv1</code> or <code>reaction_time_word_condition</code> because these are either hard to read or hard to type.</p>
<p>It is also helpful to have a consistent naming scheme, particularly for variable names that require more than one word. Two popular options are <code>CamelCase</code>, where each new word begins with a capital letter, and <code>snake_case</code>, where all letters are lower case and words are separated by an underscore. Avoid using spaces in variable names (e.g., <code>rt word</code>), and consider the additional meaning a separator can carry beyond making variable names easier to read. For example, <code>rt_word</code>, <code>rt_nonword</code>, <code>acc_word</code>, and <code>acc_nonword</code> all have the DV to the left of the separator and the level of the IV to the right. <code>rt_word_condition</code>, on the other hand, has two separators but only one of them is meaningful, making it more difficult to split variable names consistently. In this paper, we use <code>snake_case</code> for all variable names so that we don't have to remember where to put capital letters.</p>
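<p>The payoff of a meaningful separator is that code can split the names for you. As a sketch, <code>tidyr</code>'s <code>pivot_longer()</code> can use <code>names_sep = "_"</code> to turn the DV part of each name into a column and the IV level into a new <code>condition</code> column. The data here are hypothetical, and this is one possible approach rather than the method used later in the tutorial.</p>

```r
library(tidyr)

# hypothetical wide data using the DV_condition naming scheme
wide <- data.frame(
  id          = c("S001", "S002"),
  rt_word     = c(420, 390),
  rt_nonword  = c(560, 610),
  acc_word    = c(95, 88),
  acc_nonword = c(90, 81)
)

# split each name at "_": the left part (".value") becomes a column name,
# the right part becomes the value of a new "condition" column
long <- pivot_longer(wide,
                     cols = rt_word:acc_nonword,
                     names_to = c(".value", "condition"),
                     names_sep = "_")
long
```

<p>A name such as <code>rt_word_condition</code> would split into three pieces and so would not fit this two-part pattern.</p>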
<p>When working with your own data, you can rename columns in Excel, but the resources listed in the online appendices point to how to rename columns reproducibly with code.</p>
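<p>As a minimal sketch of renaming columns with code, <code>dplyr</code>'s <code>rename()</code> takes pairs of the form <code>new_name = old_name</code>. The column names with spaces below are hypothetical examples of what a spreadsheet export might contain.</p>

```r
library(dplyr)

# hypothetical data exported with spaces in the column names
raw <- data.frame(`Participant ID` = c("S001", "S002"),
                  `RT word` = c(420, 390),
                  check.names = FALSE)

# rename(new_name = old_name); backticks are needed for names containing spaces
clean <- rename(raw, id = `Participant ID`, rt_word = `RT word`)
names(clean)
```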
</div>
<div id="data-values" class="section level3" number="1.6.3">
<h3>
<span class="header-section-number">1.6.3</span> Data values<a class="anchor" aria-label="anchor" href="#data-values"><i class="fas fa-link"></i></a>
</h3>
<p>A benefit of R is that categorical data can be entered as text. In the tutorial dataset, language group is entered as 1 or 2, so that we can show you how to recode numeric values into factors with labels. However, we recommend recording meaningful labels rather than numbers from the beginning of data collection to avoid misinterpreting data due to coding errors. Note that values must match <em>exactly</em> in order to be considered in the same category and R is case sensitive, so "mono", "Mono", and "monolingual" would be classified as members of three separate categories.</p>
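<p>One way to recode numeric group codes into a labelled factor, using base R's <code>factor()</code>, is sketched below with hypothetical values. The second part illustrates the exact-matching point: three spellings of the same group become three separate categories.</p>

```r
# hypothetical language codes, stored as 1 or 2 as in the tutorial dataset
language <- c(1, 2, 2, 1)

# recode the numbers into a factor with meaningful labels
language_f <- factor(language, levels = c(1, 2),
                     labels = c("monolingual", "bilingual"))
language_f

# values must match exactly and R is case sensitive: three different categories
groups <- c("mono", "Mono", "monolingual")
nlevels(factor(groups))
```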
<p>Finally, importing data is more straightforward if cells that represent missing data are left empty rather than containing values like <code>NA</code>, <code>missing</code> or <code>999</code><a class="footnote-ref" tabindex="0" data-toggle="popover" data-content='&lt;p&gt;If your data use a missing value like &lt;code&gt;NA&lt;/code&gt; or &lt;code&gt;999&lt;/code&gt;, you can indicate this in the &lt;code&gt;na&lt;/code&gt; argument of &lt;code&gt;read_csv()&lt;/code&gt; when you read in your data. For example, &lt;code&gt;read_csv("data.csv", na = c("", "NA", 999))&lt;/code&gt; allows you to use blank cells &lt;code&gt;""&lt;/code&gt;, the letters &lt;code&gt;"NA"&lt;/code&gt;, and the number &lt;code&gt;999&lt;/code&gt; as missing values.&lt;/p&gt;'><sup>3</sup></a>. A complementary rule of thumb is that each column should only contain one type of data, such as words or numbers, not both.</p>
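<p>The <code>na</code> argument mentioned in the footnote can be sketched as follows. The file is hypothetical, written to a temporary location, and records missing reaction times in three different ways; all three are read as <code>NA</code>.</p>

```r
library(readr)

# hypothetical file in which missing RTs were recorded inconsistently
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,rt_word",
             "S001,560",
             "S002,999",
             "S003,",
             "S004,NA"), tmp)

# treat blank cells, the letters "NA", and 999 as missing values
dat <- suppressMessages(read_csv(tmp, na = c("", "NA", "999")))
dat$rt_word
```

<p>Without <code>"999"</code> in <code>na</code>, that value would be read as a real reaction time of 999 ms and would distort any summary statistics. Similarly, a cell containing a word such as <code>missing</code> would make <code>read_csv()</code> treat the whole column as text rather than numbers, which is why each column should contain only one type of data.</p>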

</div>
</div>
</div>

<script>

/* update total correct if #webex-total_correct exists */
update_total_correct = function() {
  console.log("webex: update total_correct");

  if (t = document.getElementById("webex-total_correct")) {
    var correct = document.getElementsByClassName("webex-correct").length;
    var solvemes = document.getElementsByClassName("webex-solveme").length;
    var radiogroups = document.getElementsByClassName("webex-radiogroup").length;
    var selects = document.getElementsByClassName("webex-select").length;
    
    t.innerHTML = correct + " of " + (solvemes + radiogroups + selects) + " correct";
  }
}

/* webex-solution button toggling function */
b_func = function() {
  console.log("webex: toggle hide");
  
  var cl = this.parentElement.classList;
  if (cl.contains('open')) {
    cl.remove("open");
  } else {
    cl.add("open");
  }
}

/* function for checking solveme answers */
solveme_func = function(e) {
  console.log("webex: check solveme");

  var real_answers = JSON.parse(this.dataset.answer);
  var my_answer = this.value;
  var cl = this.classList;
  if (cl.contains("ignorecase")) {
    my_answer = my_answer.toLowerCase();
  }
  if (cl.contains("nospaces")) {
    my_answer = my_answer.replace(/ /g, "")
  }

  if (my_answer == "") {
    cl.remove("webex-correct");
    cl.remove("webex-incorrect");
  } else if (real_answers.includes(my_answer)) {
    cl.add("webex-correct");
    cl.remove("webex-incorrect");
  } else {
    cl.add("webex-incorrect");
    cl.remove("webex-correct");
  }

  // match numeric answers within a specified tolerance
  if(this.dataset.tol > 0){
    var tol = JSON.parse(this.dataset.tol);
    var matches = real_answers.map(x => Math.abs(x - my_answer) < tol)
    if (matches.reduce((a, b) => a + b, 0) > 0) {
      cl.add("webex-correct");
    } else {
      cl.remove("webex-correct");
    }
  }

  // added regex bit
  if (cl.contains("regex")){
    answer_regex = RegExp(real_answers.join("|"))
    if (answer_regex.test(my_answer)) {
      cl.add("webex-correct");
    }
  }

  update_total_correct();
}

/* function for checking select answers */
select_func = function(e) {
  console.log("webex: check select");
  
  var cl = this.classList
  
  /* add style */
  cl.remove("webex-incorrect");
  cl.remove("webex-correct");
  if (this.value == "answer") {
    cl.add("webex-correct");
  } else if (this.value != "blank") {
    cl.add("webex-incorrect");
  }
  
  update_total_correct();
}

/* function for checking radiogroups answers */
radiogroups_func = function(e) {
  console.log("webex: check radiogroups");

  var checked_button = document.querySelector('input[name=' + this.id + ']:checked');
  var cl = checked_button.parentElement.classList;
  var labels = checked_button.parentElement.parentElement.children;
  
  /* get rid of styles */
  for (i = 0; i < labels.length; i++) {
    labels[i].classList.remove("webex-incorrect");
    labels[i].classList.remove("webex-correct");
  }
  
  /* add style */
  if (checked_button.value == "answer") {
    cl.add("webex-correct");
  } else {
    cl.add("webex-incorrect");
  }
  
  update_total_correct();
}

window.onload = function() {
  console.log("onload");
  /* set up solution buttons */
  var buttons = document.getElementsByTagName("button");

  for (var i = 0; i < buttons.length; i++) {
    if (buttons[i].parentElement.classList.contains('webex-solution')) {
      buttons[i].onclick = b_func;
    }
  }

  /* set up webex-solveme inputs */
  var solveme = document.getElementsByClassName("webex-solveme");

  for (var i = 0; i < solveme.length; i++) {
    /* make sure input boxes don't auto-anything */
    solveme[i].setAttribute("autocomplete","off");
    solveme[i].setAttribute("autocorrect", "off");
    solveme[i].setAttribute("autocapitalize", "off");
    solveme[i].setAttribute("spellcheck", "false");
    solveme[i].value = "";

    /* adjust answer for ignorecase or nospaces */
    var cl = solveme[i].classList;
    var real_answer = solveme[i].dataset.answer;
    if (cl.contains("ignorecase")) {
      real_answer = real_answer.toLowerCase();
    }
    if (cl.contains("nospaces")) {
      real_answer = real_answer.replace(/ /g, "");
    }
    solveme[i].dataset.answer = real_answer;

    /* attach checking function */
    solveme[i].onkeyup = solveme_func;
    solveme[i].onchange = solveme_func;
  }
  
  /* set up radiogroups */
  var radiogroups = document.getElementsByClassName("webex-radiogroup");
  for (var i = 0; i < radiogroups.length; i++) {
    radiogroups[i].onchange = radiogroups_func;
  }
  
  /* set up selects */
  var selects = document.getElementsByClassName("webex-select");
  for (var i = 0; i < selects.length; i++) {
    selects[i].onchange = select_func;
  }

  update_total_correct();
}

</script><script>
$( document ).ready(function() {
  var cite = ' ';
  var psyteachr = ' <a href="https://psyteachr.github.io/"><img src="images/logos/psyteachr_logo.png" style="height: 31px; color: white;" alt="psyTeachR: Reproducible Research" /></a>';
  var license = ' <a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/" target="blank"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png"></a>';

  $("footer div.row div:eq(1) p").html(
    psyteachr + license + cite
  );

  function move_sidebar() {
    var w = window.innerWidth;
    if (w < 992) {
      $("#toc").appendTo($("#main-nav"));
    } else {
      $("#toc").appendTo($("div.sidebar-chapter"));
    }
  }
  move_sidebar();
  window.onresize = move_sidebar;
});
</script><div class="chapter-nav">
<div class="prev"><a href="index.html">Overview</a></div>
<div class="next"><a href="getting-started.html"><span class="header-section-number">2</span> Getting Started</a></div>
</div></main><div class="col-md-3 col-lg-2 d-none d-md-block sidebar sidebar-chapter">
    <nav id="toc" data-toggle="toc" aria-label="On this page"><h2>On this page</h2>
      <ul class="nav navbar-nav">
<li><a class="nav-link" href="#introduction"><span class="header-section-number">1</span> Introduction</a></li>
<li><a class="nav-link" href="#why-r-for-data-visualisation"><span class="header-section-number">1.1</span> Why R for data visualisation?</a></li>
<li><a class="nav-link" href="#a-layered-grammar-of-graphics"><span class="header-section-number">1.2</span> A layered grammar of graphics</a></li>
<li><a class="nav-link" href="#tutorial-components"><span class="header-section-number">1.3</span> Tutorial components</a></li>
<li><a class="nav-link" href="#simulated-dataset"><span class="header-section-number">1.4</span> Simulated dataset</a></li>
<li><a class="nav-link" href="#setting-up-r-and-rstudio"><span class="header-section-number">1.5</span> Setting up R and RStudio</a></li>
<li>
<a class="nav-link" href="#preparing-your-data"><span class="header-section-number">1.6</span> Preparing your data</a><ul class="nav navbar-nav">
<li><a class="nav-link" href="#data-format"><span class="header-section-number">1.6.1</span> Data format</a></li>
<li><a class="nav-link" href="#variable-names"><span class="header-section-number">1.6.2</span> Variable names</a></li>
<li><a class="nav-link" href="#data-values"><span class="header-section-number">1.6.3</span> Data values</a></li>
</ul>
</li>
</ul>

    </nav>
</div>

</div>
</div> <!-- .container -->

<footer class="bg-primary text-light mt-5"><div class="container"><div class="row">

  <div class="col-12 col-md-6 mt-3">
    <p>"<strong>Data visualisation using R, for researchers who don’t use R</strong>" was written by Emily Nordmann, Phil McAleer, Wilhelmiina Toivo, Helena Paterson, Lisa DeBruine. It was last built on 2022-05-02.</p>
  </div>

  <div class="col-12 col-md-6 mt-3">
    <p>This book was built by the <a class="text-light" href="https://bookdown.org">bookdown</a> R package.</p>
  </div>

</div></div>
</footer>
</body>
</html>
